CAMLS: A Constraint-Based Apriori Algorithm for Mining Long Sequences
Abstract
Mining sequential patterns is a key objective in data mining because of its wide range of applications. Given a database of sequences, the challenge is to identify patterns that appear frequently across different sequences. Well-known algorithms have proved efficient, but they do not perform well on databases that contain long frequent sequences. We present CAMLS (Constraint-based Apriori Mining of Long Sequences), an efficient algorithm for mining long sequential patterns under constraints. CAMLS is based on the apriori property and consists of two phases, event-wise and sequence-wise, which employ an iterative process of candidate generation followed by frequency testing. Separating the two phases allows us to (i) introduce a novel candidate pruning strategy that increases the efficiency of the mining process and (ii) easily incorporate intra-event and inter-event constraints. Experiments on both synthetic and real datasets show that CAMLS outperforms previous algorithms when mining long sequences.
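The iterative candidate generation and frequency testing mentioned in the abstract can be illustrated with a minimal, generic level-wise sketch. This is not CAMLS itself. It assumes each event is a single item, applies no constraints, and leaves out the two-phase split and the pruning strategy described by the authors. All names (`is_subsequence`, `support`, `mine_frequent_sequences`, `min_support`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def support(pattern: Sequence, database: List[Sequence]) -> int:
    """Count the database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database)


def mine_frequent_sequences(
    database: List[Sequence], min_support: int
) -> Dict[Sequence, int]:
    """Generic apriori-style mining (illustrative only, not CAMLS)."""
    frequent: Dict[Sequence, int] = {}

    # Level 1: frequent single items.
    items = sorted({item for s in database for item in s})
    level: Dict[Sequence, int] = {}
    for item in items:
        count = support((item,), database)
        if count >= min_support:
            level[(item,)] = count

    while level:
        frequent.update(level)
        # Candidate generation: join p and q when p without its first item
        # equals q without its last item. By the apriori property, every
        # frequent sequence of the next length is produced by such a join.
        candidates = {
            p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]
        }
        # Frequency testing: keep only candidates that meet the threshold.
        level = {}
        for candidate in sorted(candidates):
            count = support(candidate, database)
            if count >= min_support:
                level[candidate] = count

    return frequent
```

The number of support counts grows with pattern length in a loop like this one, which is the kind of cost that the abstract says becomes a problem for databases with long frequent sequences.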
Similar resources
A Review Paper on Sequential Pattern Mining Algorithms
Sequential pattern mining and sequential rule mining are important data mining tasks with wide application. They are used to find frequently occurring ordered events or subsequences as patterns in a sequence database. A sequence is an ordered list of events. If one itemset is completely a subset of another itemset, it is called a subsequence. Sequential pattern mining is used in various domains such a...
Discovering Active and Profitable Patterns with RFM (Recency, Frequency and Monetary) Sequential Pattern Mining: A Constraint-Based Approach
Sequential pattern mining is an extension of association rule mining that discovers time-related behaviors in a sequence database. It extends association by adding time to the transactions. The problem of finding association rules concerns intra-transaction patterns, whereas that of sequential pattern mining concerns inter-transaction patterns. Generalized Sequential Pattern (GSP) mining a...
Objective-Oriented Utility-Based Association Mining
The necessity to develop methods for discovering association patterns to increase the business utility of an enterprise has long been recognized in the data mining community. This requires modeling specific association patterns that are both statistically (based on support and confidence) and semantically (based on objective utility) related to a given objective that a user wants to achieve or is inte...
An Algorithm for Mining Large Sequences in Databases
Frequent sequence mining is a fundamental and essential operation in the process of discovering sequential rules. Most sequence mining algorithms use the apriori methodology or build larger sequences from smaller patterns, a bottom-up approach. In this paper, we present an algorithm that uses a top-down approach for mining long sequences. Our algorithm defines dominancy of the sequence...
Mining association rules with multiple minimum supports using maximum constraints
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most previous approaches set a single minimum support threshold for all items or itemsets. But in real applications, different items may have different criteria for judging their importance. The support requirements should then vary with different items. In t...
Publication date: 2010